21 research outputs found

    Significance-Based Categorical Data Clustering

    Full text link
    Although numerous algorithms have been proposed to solve the categorical data clustering problem, how to assess the statistical significance of a set of categorical clusters remains unaddressed. To fill this gap, we employ the likelihood ratio test to derive a test statistic that can serve as a significance-based objective function in categorical data clustering. Consequently, a new clustering algorithm is proposed in which the significance-based objective function is optimized via a Monte Carlo search procedure. As a by-product, we can further calculate an empirical p-value to assess the statistical significance of a set of clusters and develop an improved gap statistic for estimating the cluster number. Extensive experimental studies suggest that our method is able to achieve comparable performance to state-of-the-art categorical data clustering algorithms. Moreover, the effectiveness of such a significance-based formulation on statistical cluster validation and cluster number estimation is demonstrated through comprehensive empirical results. Comment: 36 pages, 6 figures
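The idea of an empirical p-value for a set of categorical clusters can be illustrated with a minimal sketch. The abstract does not give the paper's test statistic or null model, so everything here is an assumption made for illustration: a generic likelihood-ratio (G) statistic stands in for the test statistic, shuffling the cluster labels stands in for the null model, and the names `g_statistic` and `empirical_p_value` are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(rows, labels):
    """Stand-in likelihood-ratio (G) statistic, summed over attributes.

    Compares per-cluster category counts with the counts expected if
    cluster membership were independent of the attribute values.
    """
    n = len(rows)
    cluster_sizes = Counter(labels)
    total = 0.0
    for j in range(len(rows[0])):
        pooled = Counter(r[j] for r in rows)
        joint = Counter((lab, r[j]) for lab, r in zip(labels, rows))
        for (lab, value), observed in joint.items():
            expected = cluster_sizes[lab] * pooled[value] / n
            total += 2.0 * observed * math.log(observed / expected)
    return total


def empirical_p_value(rows, labels, n_rounds=199, seed=0):
    """Share of label shufflings (assumed null) scoring at least as high."""
    rng = random.Random(seed)
    observed = g_statistic(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        if g_statistic(rows, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed partition itself, so p is never 0.
    return (hits + 1) / (n_rounds + 1)


# Toy data: two perfectly separated groups of categorical records.
rows = [("a", "x")] * 10 + [("b", "y")] * 10
labels = [0] * 10 + [1] * 10
print(empirical_p_value(rows, labels))
```

A small p-value indicates that random partitions rarely score as high as the observed one. The smallest value this sketch can return is 1 / (n_rounds + 1).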

    A testing-based approach to assess the clusterability of categorical data

    Full text link
    The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existing studies focus on numerical data, leaving the clusterability evaluation issue for categorical data as an open problem. Here we present TestCat, a testing-based approach to assess the clusterability of categorical data in terms of an analytical p-value. The key idea underlying TestCat is that clusterable categorical data possess many strongly correlated attribute pairs and hence the sum of chi-squared statistics of all attribute pairs is employed as the test statistic for p-value calculation. We apply our method to a set of benchmark categorical data sets, showing that TestCat outperforms those solutions based on existing clusterability evaluation methods for numeric data. To the best of our knowledge, our work provides the first way to effectively recognize the clusterability of categorical data in a statistically sound manner. Comment: 19 pages, 13 figures
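The test statistic described in this abstract, the sum of chi-squared statistics over all attribute pairs, can be sketched in a few lines. This is an illustrative sketch only: the function names `chi2_statistic` and `pairwise_chi2_sum` are hypothetical, the toy data are invented, and the analytical p-value derivation is not reproduced because the abstract does not describe it.

```python
from collections import Counter
from itertools import combinations


def chi2_statistic(x, y):
    """Pearson chi-squared statistic for two categorical columns."""
    n = len(x)
    row_counts = Counter(x)
    col_counts = Counter(y)
    joint = Counter(zip(x, y))
    stat = 0.0
    for a, row_total in row_counts.items():
        for b, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def pairwise_chi2_sum(rows):
    """Sum of chi-squared statistics over all attribute pairs."""
    columns = list(zip(*rows))
    return sum(
        chi2_statistic(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    )


# Toy data with two groups whose attributes vary together.
clustered = [("a", "x", "p")] * 10 + [("b", "y", "q")] * 10
# Toy data in which every value combination appears equally often.
independent = [(a, x, p) for a in "ab" for x in "xy" for p in "pq"] * 3

print(pairwise_chi2_sum(clustered))    # 60.0
print(pairwise_chi2_sum(independent))  # 0.0
```

In the clustered toy data every attribute pair is perfectly associated, so each pair contributes a statistic equal to the sample size (20). In the balanced data every pair contributes zero.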

    Interpretable Sequence Clustering

    Full text link
    Categorical sequence clustering plays a crucial role in various fields, but the lack of interpretability in cluster assignments poses significant challenges. Sequences inherently lack explicit features, and existing sequence clustering algorithms rely heavily on complex representations, making it difficult to explain their results. To address this issue, we propose a method called Interpretable Sequence Clustering Tree (ISCT), which combines sequential patterns with a concise and interpretable tree structure. ISCT leverages k-1 patterns to generate k leaf nodes, corresponding to k clusters, which provides an intuitive explanation of how each cluster is formed. More precisely, ISCT first projects sequences into random subspaces and then utilizes the k-means algorithm to obtain high-quality initial cluster assignments. Subsequently, it constructs a pattern-based decision tree using a boosting-based construction strategy in which sequences are re-projected and re-clustered at each node before mining the top-1 discriminative splitting pattern. Experimental results on 14 real-world data sets demonstrate that our proposed method provides an interpretable tree structure while delivering fast and accurate cluster assignments. Comment: 11 pages, 6 figures

    AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition

    Full text link
    Raw videos have been shown to contain considerable feature redundancy: in many cases, only a portion of the frames is enough to meet the requirements for accurate recognition. In this paper, we are interested in whether such redundancy can be effectively leveraged to facilitate efficient inference in continuous sign language recognition (CSLR). We propose a novel adaptive model (AdaBrowse) to dynamically select the most informative subsequence from input video sequences by modelling this problem as a sequential decision task. Specifically, we first utilize a lightweight network to quickly scan input videos to extract coarse features. Then these features are fed into a policy network to intelligently select a subsequence to process. The corresponding subsequence is finally inferred by a normal CSLR model for sentence prediction. As only a portion of frames is processed in this procedure, the total computation can be considerably reduced. Besides temporal redundancy, we are also interested in whether the inherent spatial redundancy can be seamlessly integrated to achieve further efficiency, i.e., dynamically selecting the lowest input resolution for each sample; the resulting model is referred to as AdaBrowse+. Extensive experimental results on four large-scale CSLR datasets, i.e., PHOENIX14, PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and AdaBrowse+ by achieving comparable accuracy with state-of-the-art methods with 1.44× throughput and 2.12× fewer FLOPs. Comparisons with other commonly-used 2D CNNs and adaptive efficient methods verify the effectiveness of AdaBrowse. Code is available at https://github.com/hulianyuyy/AdaBrowse. Comment: ACMMM202

    Self-Emphasizing Network for Continuous Sign Language Recognition

    No full text
    Hands and face play an important role in expressing sign language. Their features are often specifically leveraged to improve system performance. However, to effectively extract visual representations and capture trajectories for hands and face, previous methods incur high computational cost and increased training complexity. They usually employ extra heavy pose-estimation networks to locate human body keypoints or rely on additional pre-extracted heatmaps for supervision. To relieve this problem, we propose a self-emphasizing network (SEN) to emphasize informative spatial regions in a self-motivated way, with few extra computations and without additional expensive supervision. Specifically, SEN first employs a lightweight subnetwork to incorporate local spatial-temporal features to identify informative regions, and then dynamically augments original features via attention maps. It is also observed that not all frames contribute equally to recognition. We present a temporal self-emphasizing module to adaptively emphasize those discriminative frames and suppress redundant ones. A comprehensive comparison with previous methods equipped with hand and face features demonstrates the superiority of our method, even though they always require huge computations and rely on expensive extra supervision. Remarkably, with few extra computations, SEN achieves new state-of-the-art accuracy on four large-scale datasets, PHOENIX14, PHOENIX14-T, CSL-Daily, and CSL. Visualizations verify the effects of SEN on emphasizing informative spatial and temporal features. Code is available at https://github.com/hulianyuyy/SEN_CSL

    Mapping of quantitative trait loci for growth traits around the first overwintering period in Songpu mirror carp (Cyprinus carpio L.) cultured in Northeast China

    No full text
    QTL mapping studies of growth traits based on high-density linkage maps in Northeast China have great significance for local enterprises in achieving marker-assisted selection (MAS) and improving the selection accuracy of parent fish. Here, we constructed a high-density genetic linkage map with 11,445 single nucleotide polymorphism (SNP) markers in a full-sib F1 family consisting of 120 progenies in Songpu mirror carp (Cyprinus carpio) reared in Northeast China. The consensus map covered 5471.93 centimorgans (cM) across the 50 linkage groups with an average resolution of 0.54 cM. Around the first overwintering period, a total of 15 QTLs for growth traits were identified on five LGs (LG7, LG8, LG14, LG30, and LG41). All of them were responsible for body weight (BW) at both 150 (before overwintering) and 350 (after overwintering) days, explaining 9.8–17.9 % of the phenotypic variation (PV). Two major-effect QTLs were detected on LG30, which explained 16.5 % and 16.4 % (qTL9–30), and 15.8 % and 17.9 % (qTL10–30) of the PV at 150 and 350 days, respectively. In addition, 15 loci were identified for 4 growth traits other than BW at both time points, including 6 loci for body length (BL), 7 loci for body height (BH), 3 loci for body thickness (BT) and 5 loci for head length (HL). These loci explained PV ranging from 15.9 % to 21.2 % for BL, 9.3–19.6 % for BH, 9.6–15.5 % for BT, and 10.6–18.1 % for HL. Nine candidate genes (IGF1b, LEPb, PACAP1a, GHSR1a, PPSS2, IRS1, APOAIb, IRS2b and ADSS1) were identified, five of which were directly related to growth hormone (GH), including IGF1b, PACAP1a, GHSR1a, PPSS2 and IRS1. Notably, three genes (GHSR1a, PPSS2 and IRS1) were derived from two major-effect QTLs (qTL9–30 and qTL10–30). In conclusion, these novel findings may shed light on early selection of growth traits of common carp cultured in Northeast China and other relatively high-latitude regions.

    Yishen-tongbi decoction inhibits excessive activation of B cells by activating the FcγRIIb/Lyn/SHP-1 pathway and attenuates the inflammatory response in CIA rats

    No full text
    Rheumatoid arthritis (RA) is a chronic autoimmune disease. Strong evidence supports that excessive activation of B cells plays a critical role in the pathogenesis of RA. Fc gamma receptor IIb (FcγRIIb) is the B cell inhibitory receptor and inhibits BCR (B cell receptor) signalling in part by selectively dephosphorylating CD19, which is considered a co-receptor for BCR and is essential for B cell activation. Our previous study demonstrated that a FcγRIIb I232T polymorphism presented a strong genetic link to RA and may lead to the excessive activation of B cells. Therefore, novel therapeutic strategies and drugs that can effectively inhibit the excessive activation of B cells by regulating FcγRIIb are necessary for the treatment of RA. To this end, we used Burkitt’s lymphoma ST486 human B cells (lacking endogenous FcγRIIb) transfected with the 232Thr loss-of-function mutant to construct a FcγRIIb mutant cell line (ST486), and we demonstrated that YSTB treatment reduced proliferation and promoted apoptosis in ST486 cells in a dose-dependent manner. Furthermore, the intracellular Ca2+ flux of ST486 cells was decreased after treatment with YSTB, inhibiting the excessive activation of ST486 cells, and these effects correlated with the CD19/FcγRIIb-Lyn-SHP-1 pathways. Our data showed that YSTB treatment inhibited the expression of phosphorylated CD19 and upregulated the protein expression of FcγRIIb, Lyn, and SHP-1. Additionally, the CIA model was established to explore the anti-inflammatory and inhibitory effects of YSTB on bone destruction, and we found that YSTB decreased the paw oedema and arthritis index (AI) in CIA rats. It is worth mentioning that YSTB clearly decreased the AI earlier than methotrexate (MTX) (day 10 vs 16). Moreover, synovial hyperplasia, inflammatory cell infiltration and cartilage surface erosion in CIA rats were noticeably reduced after treatment with YSTB as evidenced by histopathological examination. Finally, we found that YSTB treatment suppressed bone erosion and joint space score (JNS) in CIA rats as evidenced by radiographic assessment. In summary, these data suggest that YSTB has great therapeutic potential for RA treatment.

    Genetic Differentiation of an Endangered Megalobrama terminalis Population in the Heilong River within the Genus Megalobrama

    No full text
    The population of Megalobrama terminalis, which inhabits the Sino-Russian Heilong-Amur River Basin, has declined critically since the 1960s. The species was listed in the Red Book of Endangered Fish Species by the Russian Federation in 2004. To guide the utilization and conservation programs of M. terminalis in the Heilong River (MTH), 3.1 kb of mitochondrial DNA (mtDNA) concatenated sequences and sequence-related amplified polymorphism (SRAP) markers (15 primer combinations) were applied to explore the genetic divergence and population differentiation of MTH within the genus Megalobrama. Clear genetic divergence between MTH and six other populations of the genus Megalobrama was found by haplotype network (mtDNA) and principal component (SRAP) analyses. Moreover, the STRUCTURE analysis based on SRAP data showed that MTH could be assigned to a particular cluster, whereas conspecific M. terminalis in the Qiantang River and Jinsha River Reservoir belonged to the same cluster. Analysis of molecular variance (AMOVA) and Fst statistics for the mtDNA and SRAP data revealed significant genetic variance and differentiation among all detected populations. Taken together, the results suggest that MTH has a strong genetic differentiation from other populations within the genus Megalobrama, which contributes to effective utilization in artificial cultivation and breeding of MTH. Furthermore, these results also provide a scientific basis for the management of MTH as a separate conservation unit.

    Plant Traits Guide Species Selection in Vegetation Restoration for Soil and Water Conservation

    No full text
    Great efforts have been made to improve soil and water conservation capacity by restoring plant communities in different climatic and land-use types. However, selecting suitable species from local species pools that both adapt to different site environments and achieve certain soil and water conservation capacities is a great challenge in vegetation restoration for practitioners and scientists. So far, little attention has been paid to plant functional response and effect traits related to environmental resources and ecosystem functions. In this study, together with soil properties and ecohydrological functions, we measured seven plant functional traits for the most common species in different restoration communities in a subtropical mountain ecosystem. Multivariate optimization analyses were performed to identify the functional effect types and functional response types based on specific plant traits. We found that the community-weighted means of traits differed significantly among the four community types, and the plant functional traits were strongly linked with soil physicochemical properties and ecohydrological functions. Based on three optimal effect traits (specific leaf area, leaf size, and specific root length) and two response traits (specific leaf area and leaf nitrogen concentration), seven functional effect types in relation to the soil and water conservation capacity (interception of canopy and stemflow, maximum water-holding capacity of litter, maximum water-holding capacity of soil, soil surface runoff, and soil erosion) and two plant functional response types to soil physicochemical properties were identified. The redundancy analysis showed that the sum of all canonical eigenvalues only accounted for 21.6% of the variation in functional response types, which suggests that community effects on soil and water conservation cannot explain the overall structure of community responses related to soil resources. The eight overlapping species between the plant functional response types and functional effect types were ultimately selected as the key species for vegetation restoration. Based on the above results, we offer an ecological basis for choosing the appropriate species based on functional traits, which may be very helpful for practitioners involved in ecological restoration and management.